AITopics | multi-modal learning

multi-modal learning

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

d28077e5ff52034cd35b4aa15320caea-Paper-Conference.pdf

Neural Information Processing Systems, Feb-18-2026, 06:22:06 GMT

artificial intelligence, machine learning, natural language, (15 more...)

Country: Asia > Middle East > Israel (0.04)

Genre: Research Report (1.00)

Industry:

Health & Medicine > Health Care Providers & Services (0.68)
Health & Medicine > Health Care Technology (0.46)
Health & Medicine > Diagnostic Medicine > Imaging (0.46)
Health & Medicine > Consumer Health (0.46)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
(2 more...)

5aa3405a3f865c10f420a4a7b55cbff3-Paper.pdf

Neural Information Processing Systems, Feb-8-2026, 20:30:09 GMT

latent representation quality, modality, representation, (13 more...)

Country:

North America > United States > Texas > Travis County > Austin (0.04)
Asia > Middle East > Jordan (0.04)
Asia > China > Shanghai > Shanghai (0.04)

Genre: Research Report (0.46)

Industry: Health & Medicine (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.69)
(2 more...)

Jointly Modeling Inter- & Intra-Modality Dependencies for Multi-modal Learning

Neural Information Processing Systems, Dec-27-2025, 08:50:02 GMT

Supervised multi-modal learning involves mapping multiple modalities to a target label. Previous studies in this field have concentrated on capturing in isolation either the inter-modality dependencies (the relationships between different modalities and the label) or the intra-modality dependencies (the relationships within a single modality and the label). We argue that these conventional approaches that rely solely on either inter- or intra-modality dependencies may not be optimal in general. We view the multi-modal learning problem through the lens of generative models, where we consider the target as a source of multiple modalities and the interaction between them. Towards that end, we propose the inter- & intra-modality modeling (I2M2) framework, which captures and integrates both the inter- and intra-modality dependencies, leading to more accurate predictions. We evaluate our approach using real-world healthcare and vision-and-language datasets with state-of-the-art models, demonstrating superior performance over traditional methods focusing only on one type of modality dependency.
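One way to picture "integrating both" kinds of dependency is to combine the outputs of per-modality (intra) classifiers with a joint (inter) classifier. The sketch below assumes a product-of-experts combination, summing log-probabilities and renormalising; that choice, and the names `combine_inter_intra` and `log_softmax`, are illustrative assumptions rather than the paper's exact formulation.

```python
import math
from typing import Sequence


def log_softmax(logits: Sequence[float]) -> list[float]:
    """Numerically stable log-softmax over class logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(v - m) for v in logits))
    return [v - log_z for v in logits]


def combine_inter_intra(intra_logits: Sequence[Sequence[float]],
                        inter_logits: Sequence[float]) -> list[float]:
    """Hypothetical I2M2-style fusion (assumed product of experts):
    add the log-probabilities of every intra-modality classifier p(y | x_k)
    to those of the inter-modality classifier p(y | x_1, ..., x_K),
    then renormalise into a single class distribution."""
    total = log_softmax(inter_logits)  # joint (inter-modality) expert
    for logits in intra_logits:        # one expert per modality
        for c, lp in enumerate(log_softmax(logits)):
            total[c] += lp
    return [math.exp(v) for v in log_softmax(total)]


probs = combine_inter_intra(
    intra_logits=[[2.0, 0.5], [0.2, 0.1]],  # e.g. image-only and text-only classifiers
    inter_logits=[1.0, 0.0],                # classifier on the (image, text) pair
)
print(probs)
```

A useful property of this design: an intra-modality classifier that is uninformative (equal logits for every class) adds the same constant to every class and therefore leaves the joint prediction unchanged.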

artificial intelligence, machine learning, modeling inter- & intra-modality dependency, (3 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.40)

What Makes Multi-Modal Learning Better than Single (Provably)

Neural Information Processing Systems, Dec-24-2025, 04:17:05 GMT

The world provides us with data of multiple modalities. Intuitively, models fusing data from different modalities outperform their uni-modal counterparts, since more information is aggregated. Recently, following the success of deep learning, an influential line of work on deep multi-modal learning has achieved remarkable empirical results on various applications. However, theoretical justifications in this field are notably lacking. Can multi-modal learning provably perform better than uni-modal learning? In this paper, we answer this question under the most popular multi-modal fusion framework, which first encodes features from different modalities into a common latent space and then seamlessly maps the latent representations into the task space. We prove that learning with multiple modalities achieves a smaller population risk than using only a subset of those modalities. The main intuition is that the former has a more accurate estimate of the latent space representation. To the best of our knowledge, this is the first theoretical treatment to capture important qualitative phenomena observed in real multi-modal applications from the generalization perspective. Combined with experimental results, we show that multi-modal learning does possess an appealing formal guarantee.
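The stated intuition (more modalities give a more accurate estimate of the shared latent representation) can be illustrated with a toy simulation. Everything here is an assumption for illustration, not the paper's setting or proof: each modality observes the same scalar latent `z` plus independent Gaussian noise, and the "encoder" simply averages the available modalities. Under these assumptions the estimation error should shrink roughly as `noise_std**2 / k` for `k` modalities.

```python
import random
import statistics


def latent_estimation_mse(num_modalities: int, noise_std: float = 1.0,
                          num_samples: int = 20000, seed: int = 0) -> float:
    """Toy model (assumed): every modality is z + Gaussian noise.
    The latent is estimated by averaging the available modalities, and the
    mean squared error of that estimate over many samples is returned."""
    rng = random.Random(seed)
    errors = []
    for _ in range(num_samples):
        z = rng.gauss(0.0, 1.0)  # true latent representation
        views = [z + rng.gauss(0.0, noise_std) for _ in range(num_modalities)]
        z_hat = sum(views) / num_modalities  # estimate in the shared latent space
        errors.append((z_hat - z) ** 2)
    return statistics.fmean(errors)


for k in (1, 2, 3):
    print(k, round(latent_estimation_mse(k), 3))
```

Averaging is only the simplest possible fusion encoder; the point of the sketch is the trend (error decreasing as modalities are added), not the specific numbers.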

modality, name change, provably, (9 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.40)

Mitigating Modality Imbalance in Multi-modal Learning via Multi-objective Optimization

Fernando, Heshan, Ram, Parikshit, Zhou, Yi, Dan, Soham, Samulowitz, Horst, Baracaldo, Nathalie, Chen, Tianyi

arXiv.org Artificial Intelligence, Nov-11-2025

Multi-modal learning (MML) aims to integrate information from multiple modalities, which is expected to lead to superior performance over single-modality learning. However, recent studies have shown that MML can underperform, even compared to single-modality approaches, due to imbalanced learning across modalities. Methods have been proposed to alleviate this imbalance using different heuristics, which often lead to computationally intensive subroutines. In this paper, we reformulate the MML problem as a multi-objective optimization (MOO) problem that overcomes the imbalanced learning issue among modalities, and we propose a gradient-based algorithm to solve the modified MML problem. We provide convergence guarantees for the proposed method, and empirical evaluations on popular MML benchmarks show its improved performance over existing balanced MML and MOO baselines, with up to ~20x reduction in subroutine computation time. Our code is available at https://github.com/heshandevaka/MIMO.
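To make the MOO framing concrete: with one loss per modality, a classic gradient-based MOO step looks for the minimum-norm convex combination of the per-modality gradients, which yields a direction that does not increase either loss to first order. The sketch below is the textbook two-objective MGDA closed form, named plainly as a stand-in; it is not claimed to be the algorithm in the MIMO repository, and `min_norm_two` is a hypothetical helper.

```python
from typing import Sequence


def min_norm_two(g1: Sequence[float],
                 g2: Sequence[float]) -> tuple[float, list[float]]:
    """Two-objective MGDA step (standard closed form, not the paper's method).
    Returns (alpha, d) where d = alpha*g1 + (1 - alpha)*g2 minimises ||d||^2
    over alpha in [0, 1]."""
    diff = [a - b for a, b in zip(g1, g2)]
    denom = sum(v * v for v in diff)
    if denom == 0.0:
        # identical gradients: every convex combination is the same vector
        return 0.5, list(g1)
    # unconstrained minimiser of ||g2 + alpha*(g1 - g2)||^2
    alpha = sum((b - a) * b for a, b in zip(g1, g2)) / denom
    alpha = min(1.0, max(0.0, alpha))  # project back onto [0, 1]
    d = [alpha * a + (1.0 - alpha) * b for a, b in zip(g1, g2)]
    return alpha, d


# e.g. gradients of an audio-branch loss and a video-branch loss (illustrative values)
alpha, direction = min_norm_two([1.0, 0.2], [-0.5, 1.0])
print(alpha, direction)
```

When the optimum lies strictly inside `[0, 1]`, the resulting direction has the same inner product with both gradients, so neither modality's objective dominates the update; this is the sense in which an MOO view can counteract imbalanced learning.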

artificial intelligence, modality, optimization problem, (13 more...)

2511.06686

Country: Europe (0.28)

Genre: Research Report (0.84)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)

Jointly Modeling Inter- & Intra-Modality Dependencies for Multi-modal Learning

Neural Information Processing Systems, Oct-10-2025, 17:29:28 GMT

We view the multi-modal learning problem through the lens of generative models, where we consider the target as a source of multiple modalities and the interaction between them. Towards that end, we propose the inter- & intra-modality modeling (I2M2) framework, which captures and integrates both the inter- and intra-modality dependencies, leading to more accurate predictions.

dataset, dependency, modality, (12 more...)

Country: Asia > Middle East > Israel (0.04)

Genre: Research Report (1.00)

Industry:

Health & Medicine > Health Care Providers & Services (0.68)
Health & Medicine > Health Care Technology (0.46)
Health & Medicine > Diagnostic Medicine > Imaging (0.46)
Health & Medicine > Consumer Health (0.46)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
(2 more...)

Export Reviews, Discussions, Author Feedback and Meta-Reviews

Neural Information Processing Systems, Oct-2-2025, 20:08:07 GMT

The most significant improvement for unimodal queries came when we increased the iteration number from 0 to 1 (margin of improvement: 0.034), but we still observe further gains when increasing it to 5 and 10 (margins of improvement: 0.048 and 0.05). Based on your suggestion of multi-prediction training that approximates the joint likelihood, we evaluated the performance of the multimodal deep network trained jointly on $x$ and $y$ as in the original MP-DBM (i.e., randomly selecting subsets of variables from both data modalities $x$ and $y$ and predicting them given the rest). In our preliminary results, the original MP-DBM-style training jointly on $x$ and $y$ gave worse results than our proposed training scheme (i.e., predicting $x$ given $y$ and vice versa) for both multimodal and unimodal queries. We will include complete results in the revision.

R38: Fine-tuning brings a significant improvement: before MDRNN fine-tuning, we obtained test-set mAPs of 0.632 and 0.521 for multimodal and unimodal queries, respectively, and these numbers rise to 0.686 and 0.607 after MDRNN fine-tuning.
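The fine-tuning numbers in the response to R38 can be turned into absolute and relative gains with a short calculation. The mAP values are taken directly from the response; the helper name `fine_tuning_gains` is hypothetical.

```python
def fine_tuning_gains(before: dict[str, float],
                      after: dict[str, float]) -> dict[str, tuple[float, float]]:
    """Absolute and relative mAP gain per query type."""
    gains = {}
    for query, base in before.items():
        delta = after[query] - base
        gains[query] = (delta, delta / base)
    return gains


# test-set mAPs before and after MDRNN fine-tuning, as reported in the response (R38)
before = {"multimodal": 0.632, "unimodal": 0.521}
after = {"multimodal": 0.686, "unimodal": 0.607}

for query, (delta, rel) in fine_tuning_gains(before, after).items():
    print(f"{query}: +{delta:.3f} mAP ({rel:.1%} relative)")
```

This gives gains of 0.054 mAP for multimodal queries (about 8.5% relative) and 0.086 mAP for unimodal queries (about 16.5% relative), so fine-tuning helps unimodal queries more in both absolute and relative terms.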

iteration, modality, multimodal and unimodal query, (10 more...)

Country: North America > Canada > Quebec > Montreal (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.49)

What Makes Multi-modal Learning Better than Single (Provably)

Yu Huang

Neural Information Processing Systems, Aug-14-2025, 16:46:51 GMT

The world provides us with data of multiple modalities.

artificial intelligence, machine learning, modality, (14 more...)

Country:

North America > United States > Texas > Travis County > Austin (0.04)
Asia > Middle East > Jordan (0.04)
Asia > China > Shanghai > Shanghai (0.04)

Genre: Research Report (0.46)

Industry: Health & Medicine (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.69)
(2 more...)

Multi-Modal Learning with Bayesian-Oriented Gradient Calibration

Guo, Peizheng, Wang, Jingyao, Guo, Huijie, Li, Jiangmeng, Sun, Chuxiong, Zheng, Changwen, Qiang, Wenwen

arXiv.org Artificial Intelligence, May-30-2025

Multi-Modal Learning (MML) integrates information from diverse modalities to improve predictive accuracy. However, existing methods mainly aggregate gradients with fixed weights and treat all dimensions equally, overlooking the intrinsic gradient uncertainty of each modality. This may lead to (i) excessive updates in sensitive dimensions, degrading performance, and (ii) insufficient updates in less sensitive dimensions, hindering learning. To address this issue, we propose BOGC-MML, a Bayesian-Oriented Gradient Calibration method for MML that explicitly models the gradient uncertainty and guides the model optimization towards the optimal direction. Specifically, we first model each modality's gradient as a random variable and derive its probability distribution, capturing the full uncertainty in the gradient space. Then, we propose an effective method that converts the precision (inverse variance) of each gradient distribution into a scalar evidence value. This evidence quantifies the confidence of each modality in every gradient dimension. Using this evidence, we explicitly quantify per-dimension uncertainties and fuse them via a reduced Dempster-Shafer rule. The resulting uncertainty-weighted aggregation produces a calibrated update direction that balances sensitivity and conservatism across dimensions. Extensive experiments on multiple benchmark datasets demonstrate the effectiveness and advantages of the proposed method.
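The core idea of using the precision of each modality's gradient as per-dimension evidence, so that the more confident modality dominates each dimension, can be sketched as follows. This is a deliberately simplified stand-in under stated assumptions: mean and variance are estimated from per-sample gradients, and modalities are fused by plain inverse-variance weighting rather than the paper's reduced Dempster-Shafer rule. The function name and the `[modality][sample][dimension]` input layout are hypothetical.

```python
import statistics
from typing import Sequence


def precision_weighted_gradient(per_sample_grads: Sequence[Sequence[Sequence[float]]],
                                eps: float = 1e-8) -> list[float]:
    """per_sample_grads[m][s][d] is the gradient of sample s for modality m,
    dimension d. For each modality and dimension, the gradient mean and
    variance are estimated; the precision 1 / (var + eps) serves as scalar
    evidence, and modalities are fused per dimension by a precision-weighted
    average (inverse-variance weighting, not a Dempster-Shafer combination)."""
    num_dims = len(per_sample_grads[0][0])
    fused = []
    for d in range(num_dims):
        weighted_sum = 0.0
        total_evidence = 0.0
        for grads in per_sample_grads:           # one entry per modality
            column = [g[d] for g in grads]       # samples for this dimension
            mean = statistics.fmean(column)
            var = statistics.pvariance(column, mu=mean)
            evidence = 1.0 / (var + eps)         # high precision -> high confidence
            weighted_sum += evidence * mean
            total_evidence += evidence
        fused.append(weighted_sum / total_evidence)
    return fused


# modality A is consistent in dimension 0, modality B in dimension 1
grads_a = [[0.9, 0.0], [1.1, 2.0]]
grads_b = [[-1.0, 1.9], [1.0, 2.1]]
print(precision_weighted_gradient([grads_a, grads_b]))
```

In the example, each dimension of the fused update ends up close to the mean of whichever modality has low variance there, which mirrors the abstract's goal of calibrating updates per dimension instead of using one fixed weight per modality.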

artificial intelligence, machine learning, natural language, (19 more...)

2505.23071

Country:

North America > United States (0.46)
Asia (0.28)

Genre: Research Report (0.82)

Industry:

Information Technology (0.67)
Health & Medicine > Health Care Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Vision (0.94)
(2 more...)

Jointly Modeling Inter- & Intra-Modality Dependencies for Multi-modal Learning

Neural Information Processing Systems, May-27-2025, 17:49:30 GMT

Supervised multi-modal learning involves mapping multiple modalities to a target label. Previous studies in this field have concentrated on capturing in isolation either the inter-modality dependencies (the relationships between different modalities and the label) or the intra-modality dependencies (the relationships within a single modality and the label). We argue that these conventional approaches that rely solely on either inter- or intra-modality dependencies may not be optimal in general. We view the multi-modal learning problem through the lens of generative models, where we consider the target as a source of multiple modalities and the interaction between them. Towards that end, we propose the inter- & intra-modality modeling (I2M2) framework, which captures and integrates both the inter- and intra-modality dependencies, leading to more accurate predictions.

modeling inter- & intra-modality dependency, multi-modal learning

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.44)
